NETOBSERV-2443 fix bug, improve cleanup and writing files #404
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files. Approvers can indicate their approval by commenting on the PR. The full list of commands accepted by this bot can be found here.
Codecov Report ❌
Additional details and impacted files:
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
- Coverage   13.84%   13.82%   -0.02%
==========================================
  Files          18       18
  Lines        2731     2734       +3
==========================================
  Hits          378      378
- Misses       2329     2332       +3
  Partials       24       24
/test ?
@memodi: The following commands are available to trigger required jobs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test integration-tests
/test integration-tests
Integration tests are failing because, for some reason, the CI cluster is taking too long to pull images. /test integration-tests
/test integration-tests |
1 similar comment
/test integration-tests |
- Increase waitDaemonset timeout from 50s to 5 minutes (30×10s):
  * CI environments often have slow image pulls.
  * The previous timeout was too aggressive for registry operations.
- Add comprehensive diagnostic output on pod startup failure:
  * Pod status with node placement (get pods -o wide).
  * Recent events, to identify ImagePullBackOff and similar errors.
  * Pod event details from describe output.
  * Daemonset logs, if containers started.

This helps diagnose ContainerCreating issues in CI where pods fail to start due to image pull problems or resource constraints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
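For illustration, here is a minimal Go sketch of the kind of diagnostic collection described above, shelling out to kubectl for pod placement, recent events, pod descriptions, and daemonset logs. The namespace and daemonset name are assumptions, not necessarily the values the CLI actually uses.

```go
package main

import (
	"fmt"
	"os/exec"
)

// dumpDaemonsetDiagnostics shells out to kubectl to gather the diagnostics
// listed above when pods fail to start. Namespace and names are hypothetical.
func dumpDaemonsetDiagnostics(namespace, daemonset string) {
	cmds := [][]string{
		// Pod status with node placement.
		{"kubectl", "get", "pods", "-n", namespace, "-o", "wide"},
		// Recent events, useful to spot ImagePullBackOff or scheduling issues.
		{"kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"},
		// Per-pod details, including image pull errors.
		{"kubectl", "describe", "pods", "-n", namespace},
		// Container logs, if any containers actually started.
		{"kubectl", "logs", "-n", namespace, "daemonset/" + daemonset, "--tail=50"},
	}
	for _, c := range cmds {
		out, err := exec.Command(c[0], c[1:]...).CombinedOutput()
		fmt.Printf("$ %v\n%s\n", c, out)
		if err != nil {
			fmt.Printf("(command failed: %v)\n", err)
		}
	}
}

func main() {
	dumpDaemonsetDiagnostics("netobserv-cli", "netobserv-cli")
}
```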
CI runs showed only 4/6 pods ready with the 5-minute timeout, indicating image pulls need more time. Increasing to 10 minutes (60×10s) to accommodate slower CI registry pulls and pod scheduling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
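As a rough sketch of the retry budget described in these two commits (the real wait lives in the bash script, so structure and names here are assumptions), the new limit works out to 60 attempts at a 10-second interval:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitFor polls check until it reports ready or the attempt budget runs out.
// 60 attempts x 10s ≈ 10 minutes, the new budget described above (up from 5).
func waitFor(check func() (bool, error)) error {
	const (
		attempts = 60
		interval = 10 * time.Second
	)
	for i := 0; i < attempts; i++ {
		ready, err := check()
		if err != nil {
			return err
		}
		if ready {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("daemonset not ready within the timeout")
}

func main() {
	// Example usage with a fake check; a real check would compare desired vs.
	// ready pod counts for the daemonset.
	tries := 0
	err := waitFor(func() (bool, error) {
		tries++
		return tries >= 3, nil // pretend the daemonset is ready on the 3rd poll
	})
	fmt.Println("done:", err)
}
```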
In E2E test mode, the bash script's waitDaemonset() could exit with an error after 10 minutes while the Go test's isDaemonsetReady() was still polling. This created a race where:
1. The Go test calls StartCommand(), which runs the bash script asynchronously.
2. The bash script calls waitDaemonset() and waits 10 minutes.
3. The Go test calls isDaemonsetReady() and waits 10 minutes.
4. If the bash script times out first, it calls exit 1, killing the process.
5. The Go test is left polling a dead command.

Solution: when isE2E=true, skip the bash-level wait since the Go test framework handles pod readiness checking via isDaemonsetReady(). For manual CLI usage (isE2E=false), the wait still runs as before to provide user feedback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
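A hedged Go sketch of the test-side flow this commit relies on: the script is started asynchronously and readiness is polled from Go, which is why a second, script-level wait is redundant (and racy) when isE2E=true. The binary path, arguments, and the isDaemonsetReady stub are hypothetical placeholders, not the real test code.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

// isDaemonsetReady stands in for the Go test helper named in the commit
// message; its real implementation and signature are not shown here.
func isDaemonsetReady(name string) (bool, error) {
	return true, nil // placeholder
}

func main() {
	// Start the capture script asynchronously, like StartCommand() does in the
	// tests. Path and arguments are hypothetical.
	cmd := exec.Command("./network-observability-cli", "flows", "--background")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Readiness is owned by the Go side. If the bash script also waited and
	// timed out first, it would exit 1 and leave this loop polling a dead
	// process — the race described above.
	deadline := time.Now().Add(10 * time.Minute)
	for {
		ready, err := isDaemonsetReady("netobserv-cli")
		if err != nil {
			log.Fatal(err)
		}
		if ready {
			break
		}
		if time.Now().After(deadline) {
			log.Fatal("daemonset not ready before deadline")
		}
		time.Sleep(10 * time.Second)
	}
	fmt.Println("capture running; cleanup is handled by the test afterwards")
}
```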
Tests were failing because:
1. Commands ran with --max-time=1m in foreground mode.
2. After 1 minute, the capture finished and auto-cleanup ran.
3. Cleanup deleted the daemonset.
4. isDaemonsetReady() was polling for a deleted daemonset.
5. The test failed with "context deadline exceeded".

Using --background mode prevents automatic cleanup when the capture finishes, allowing the test to verify daemonset privilege settings before cleanup runs. Also, check that the CLI is running instead of just the daemonset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
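To illustrate "check that the CLI is running instead of just the daemonset", here is a minimal client-go sketch that treats the capture as running only when the agent daemonset is ready and the collector pod is still up. The namespace and resource names are assumptions.

```go
package e2e

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// captureIsRunning returns true only when the agent daemonset is fully ready
// AND the CLI collector pod is still running, so a test does not keep polling
// a daemonset that cleanup has already deleted. Names are hypothetical.
func captureIsRunning(ctx context.Context, c kubernetes.Interface) (bool, error) {
	const ns = "netobserv-cli"

	ds, err := c.AppsV1().DaemonSets(ns).Get(ctx, "netobserv-cli", metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	dsReady := ds.Status.DesiredNumberScheduled > 0 &&
		ds.Status.NumberReady == ds.Status.DesiredNumberScheduled

	// Also check the CLI collector pod, not just the daemonset.
	pod, err := c.CoreV1().Pods(ns).Get(ctx, "collector", metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return dsReady && pod.Status.Phase == corev1.PodRunning, nil
}
```

A check like this would be plugged into the test's polling loop in place of a daemonset-only readiness check.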
@memodi: The following test failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Description
NETOBSERV-2443 fix bug, improve cleanup and writing files
With the help of Claude, I was able to identify the flakiness coming from the pty and made a bunch of improvements, as described below.
After several runs, the CLI tests are now much more stable.
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.